Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling
نویسندگان
چکیده
Although deep neural network (DNN) based acoustic models have obtained remarkable results, the automatic speech recognition (ASR) performance still remains low in noise and reverberant conditions. To address this issue, a speech enhancement front-end is often used before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts that are limiting the ASR performance improvement. Uncertainty decoding has been proposed to better interconnect the speech enhancement front-end and ASR back-end and mitigate the mismatch caused by residual noise and artifacts. By considering features as distributions instead of point estimates, the uncertainty decoding approach modifies the conventional decoding rules to account for the uncertainty emanating from the speech enhancement. Although the concept of uncertainty decoding has been investigated for DNN acoustic models recently, finding efficient ways to incorporate distribution of the enhanced features within a DNN acoustic model still requires further investigations. In this paper, we propose to parameterize the distribution of the enhanced feature and estimate the parameters by backpropagation using an unsupervised adaptation scheme. We demonstrate the effectiveness of the proposed approach on real audio data of the CHiME3 dataset.
منابع مشابه
Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling
In this article, we propose an uncertainty decoding scheme for DNN-HMM hybrid systems based on numerical sampling. A finite set of samples is drawn from the estimated probability distribution of the acoustic features and subsequently passed through feature transformations/extensions and the deep neural network (DNN). Then, the nonlinearly-transformed feature samples are averaged at the output o...
متن کاملUncertainty propagation through deep neural networks
In order to improve the ASR performance in noisy environments, distorted speech is typically pre-processed by a speech enhancement algorithm, which usually results in a speech estimate containing residual noise and distortion. We may also have some measures of uncertainty or variance of the estimate. Uncertainty decoding is a framework that utilizes this knowledge of uncertainty in the input fe...
متن کاملUncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features
Speech enhancement is an important front-end technique to improve automatic speech recognition (ASR) in noisy environments. However, the wrong noise suppression of speech enhancement often causes additional distortions in speech signals, which degrades the ASR performance. To compensate the distortions, ASR needs to consider the uncertainty of enhanced features, which can be achieved by using t...
متن کاملAn improved uncertainty decoding scheme with weighted samples for DNN-HMM hybrid systems
In this paper, we advance a recently-proposed uncertainty decoding scheme for DNN-HMM (deep neural network hidden Markov model) hybrid systems. This numerical sampling concept averages DNN outputs produced by a finite set of feature samples (drawn from a probabilistic distortion model) to approximate the posterior likelihoods of the context-dependent HMM states. As main innovation, we propose a...
متن کاملJoint Uncertainty Decoding for Robust Large Vocabulary Speech Recognition
Standard techniques to increase automatic speech recognition noise robustness typically assume recognition models are clean trained. This “clean” training data may in fact not be clean at all, but may contain channel variations, varying noise conditions, as well as different speakers. Hence rather than considering noise robustness techniques as compensating clean acoustic models for environment...
متن کامل